Abstract: In this study, the anticipatory Vowel-to-Vowel (V-to-V) coarticulatory effect in Vowel-Consonant-Vowel
(VCV) sequences is investigated. The subjects are twelve native speakers of Standard Chinese, and the F2
offset value of the first vowel is analyzed. Results show that trans-segmental anticipatory coarticulation
exists in Chinese. The coarticulatory effect is greater in the labial context than in the alveolar context,
which is in line with the degrees of articulatory constraint (DAC) model. Because the articulatory strength of
aspirated consonants is high, the coarticulatory effect is greater in the context of aspirated stops.
I. Introduction

This study deals with the acoustic signaling of coarticulation, which refers to the articulatory 
modification of a speech sound under the influence of adjacent segments. For example, all else being equal, the 
back vowel [u] in 'two' is produced farther forward than the same vowel in 'who' due to the influence of the 
adjacent coronal consonant. Coarticulatory effects may vary with the specific context or the phonological system
of a language. In his classic spectrographic study of VCV sequences in three languages, Öhman [1] found that
F2 values of target vowels varied more due to vowel context in English and Swedish than in Russian. He
attributed the coarticulatory differences to the languages' consonant systems, arguing that the requirements on 
the tongue body imposed by contrastive palatalization in Russian, but not in English or Swedish, restricted 
transconsonantal coarticulation in Russian. 

Consonant restrictions on V-to-V coarticulation have also been reported by Recasens [2], who found 
less V-to-V coarticulation across the velarized lateral of Catalan than across the 'clear' lateral of Spanish and 
German. He and his colleagues ascribed the coarticulatory differences to different lingual constraints for these 
laterals. Bladon and Al-Bamerni [3] originated the concept of 'coarticulatory resistance', which holds that phonetic
segments possess inherent properties that limit the extent to which they can be influenced by neighboring
segments. Using this concept within a coarticulatory approach to speech production, Recasens [4] developed the 
'degrees of articulatory constraint' (DAC) model to account for coarticulatory effects of both vowels and 
consonants. Recasens' model predicts that the more a specific region of the tongue is involved in the occlusion 
for the C, the more the C affects V, but the less it can be shaped by the vowel, and the less the transconsonantal 
V-to-V coarticulation. 

There have been a number of studies on the coarticulatory effect of segments in Chinese, including the
analysis of the acoustic coarticulatory patterns of voiceless fricatives in CVCV [5], the study of vowel formant
patterns and coarticulation in monosyllables with voiceless stop initials [6], the acoustic study of intersyllabic
anticipatory coarticulation for three places of articulation of C2 in CVCV [7], vowel segmental coarticulation in
read speech in Standard Chinese [8], and anticipatory coarticulation in V1#C2V2 sequences [9]. These studies found
that coarticulation exists in both segment-adjacent and trans-segmental contexts in Chinese.

Coarticulation is a common phenomenon across languages, and it is believed to affect the smoothness and
naturalness of synthesized speech in Text-to-Speech systems. The naturalness of synthesized speech will
therefore be greatly improved if coarticulation is properly modeled [10]. The research presented in this
paper aims to investigate V-to-V coarticulation in VCV sequences in Chinese. Coarticulation may be generally
classified as carry-over (left-to-right) or anticipatory (right-to-left) [11], and the present study focuses
on anticipatory coarticulation.

II. Methodology 

2.1 Speakers, stimuli and recording 

Twelve native speakers of Standard Chinese, six male and six female, participated in the recording.
The stimuli are disyllabic words of the form C1V1.C2V2, in which V2 provides the 'changing' vowel context and
V1 is the 'fixed' vowel, so that the effect of the changing vowel on the fixed vowel can be measured. The
fixed vowel is /a/, and the changing vowel contexts /i/ vs. /u/ are used to influence the F2 offset
frequency of the fixed vowel. The intervocalic consonant C2 is one of /b, p, d, t/: two unaspirated stops, /b/
and /d/, and two aspirated ones, /p/ and /t/. All the words carry normal stress, without neutral-tone syllables.
An example of a pair of words is 'dadi' and 'dadu', which mean 'archenemy' and 'to bet' respectively. Two sets
of words with identical combinations are used, so there are 16 words in the word list (4 stops x 2 changing
vowel contexts x 2 sets).

Recording was done in a sound-treated room, and the acoustic data were recorded directly into the
computer at a sampling rate of 16 kHz using Cool Edit Pro. The speakers were asked to read the word list
three times at a normal pace, in random order for each repetition, so each speaker produced
48 tokens: 16 words x 3 repetitions. In total, 576 tokens were acoustically analyzed (48 tokens x 12 speakers).
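
The token counts above follow directly from the factorial design. A minimal sketch of that arithmetic is given
below; the variable names are illustrative only and the word sets are represented by placeholder indices, since
the full word list is not reproduced here.

```python
from itertools import product

# Design factors as described in section 2.1 (labels are illustrative)
STOPS = ["b", "p", "d", "t"]      # intervocalic consonant C2
CHANGING_VOWELS = ["i", "u"]      # V2 contexts
WORD_SETS = [1, 2]                # two sets of identical combinations (placeholder indices)
REPETITIONS = 3
SPEAKERS = 12

# One entry per word in the list: every stop x vowel x set combination
word_list = list(product(STOPS, CHANGING_VOWELS, WORD_SETS))

tokens_per_speaker = len(word_list) * REPETITIONS
total_tokens = tokens_per_speaker * SPEAKERS

print(len(word_list), tokens_per_speaker, total_tokens)  # 16 48 576
```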

2.2 Procedure and measurements

1) F2 offset value: This study investigates the extent of V-to-V coarticulation in VCV sequences by
examining vowel formants. Formant values are extracted using Praat [12], and the effect of trans-consonantal
coarticulation is analyzed by measuring the F2 offset value of the fixed vowel. The F2 offset frequency is
taken at the offset of the fixed vowel V1.

Fig. 1 displays the waveform and spectrogram of 'dadi' (archenemy), in which C2 is an unaspirated
alveolar consonant. For the purpose of this study, the F2 offset value is taken at the offset point of the
vowel, that is, point 'A' on the graph.





Fig. 1 Waveform and spectrogram of 'dadi' 



2) F2 delta: In order to compare the extent of coarticulatory effects across consonant contexts, the
differences in F2 offset values caused by the changing V2 contexts are calculated in addition to the F2 offset
values themselves. Coarticulatory effects due to the changing V2 contexts are indexed by F2 delta values
obtained at the V1 F2 offset. The F2 delta (in Hz) is derived by computing the difference in offset
frequencies of the fixed vowel in each sequence pair, as shown in formula (1):

$\Delta F_2 = F_{2i} - F_{2u}$ (1)

In (1), $F_{2i}$ and $F_{2u}$ refer to the V1 F2 offset values preceding the vowels /i/ and /u/ respectively,
and $\Delta F_2$ is the F2 delta at the offset of V1.
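
As an illustration of formula (1), the following minimal sketch computes the F2 delta for one sequence pair.
The function name and the input values are hypothetical and are not measurements from this study.

```python
def f2_delta(f2_offset_before_i, f2_offset_before_u):
    """Formula (1): V1 F2 offset before /i/ minus V1 F2 offset before /u/, in Hz."""
    return f2_offset_before_i - f2_offset_before_u

# Hypothetical V1 F2 offset values (Hz) for one pair such as 'dadi' / 'dadu'
example_delta = f2_delta(1650.0, 1340.0)
print(example_delta)  # 310.0
```

A positive value means that the V1 F2 offset is higher before /i/ than before /u/, that is, V1 is shifted
toward the F2 of the upcoming vowel.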

Fig. 2 displays the F2 contours of the sequence pair 'dadi' (archenemy) and 'dadu' (to bet), with the
contour of 'dadi' as a solid line and that of 'dadu' as a dashed line. In this sequence pair, for the changing
vowel context, the F2 of /i/ is high and that of /u/ is low. If the V1 F2 offsets differ in this pair, then it is
reasonable to attribute the frequency difference to the high vs. low F2 contexts in V2. The greater the F2
delta value, the greater the coarticulatory effect of V2 on V1.







Fig. 2 F2 contours of the sequence pair 'dadi' and 'dadu', with the former in solid line, and the latter in dashed line

A repeated measures ANOVA was performed with two within-subjects factors: aspiration
(unaspirated, aspirated) and place of articulation (labial, alveolar).
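
For a within-subjects factor with two levels, the repeated measures F statistic with (1, n - 1) degrees of
freedom equals the squared one-sample t statistic computed on a per-unit contrast score. The sketch below
illustrates this for the place, aspiration and interaction contrasts on F2 delta. The data are hypothetical,
the function and variable names are illustrative, and the choice of unit is an assumption: a value of
F(1, 71) would be consistent with 72 units (12 speakers x 3 repetitions x 2 word sets), but the unit of
analysis is not stated in this paper. It is not a reproduction of the analysis actually run.

```python
from statistics import mean, variance

def contrast_f(scores):
    """Return (F, (df1, df2)) for a single-df within-subjects contrast.

    F = n * mean^2 / sample variance, i.e. the squared one-sample t statistic.
    """
    n = len(scores)
    return n * mean(scores) ** 2 / variance(scores), (1, n - 1)

# Hypothetical F2 delta values (Hz); one dict per unit, keys are (place, aspiration)
units = [
    {("lab", "unasp"): 360, ("lab", "asp"): 350, ("alv", "unasp"): 230, ("alv", "asp"): 330},
    {("lab", "unasp"): 340, ("lab", "asp"): 355, ("alv", "unasp"): 210, ("alv", "asp"): 345},
    {("lab", "unasp"): 370, ("lab", "asp"): 330, ("alv", "unasp"): 250, ("alv", "asp"): 320},
    {("lab", "unasp"): 350, ("lab", "asp"): 345, ("alv", "unasp"): 240, ("alv", "asp"): 350},
]

def place_contrast(u):
    # Labial mean minus alveolar mean, averaged over aspiration
    return (u["lab", "unasp"] + u["lab", "asp"]) / 2 - (u["alv", "unasp"] + u["alv", "asp"]) / 2

def aspiration_contrast(u):
    # Aspirated mean minus unaspirated mean, averaged over place
    return (u["lab", "asp"] + u["alv", "asp"]) / 2 - (u["lab", "unasp"] + u["alv", "unasp"]) / 2

def interaction_contrast(u):
    # Place difference among unaspirated stops minus place difference among aspirated stops
    return (u["lab", "unasp"] - u["alv", "unasp"]) - (u["lab", "asp"] - u["alv", "asp"])

for name, fn in [("place", place_contrast),
                 ("aspiration", aspiration_contrast),
                 ("place x aspiration", interaction_contrast)]:
    f_value, (df1, df2) = contrast_f([fn(u) for u in units])
    print(f"{name}: F({df1}, {df2}) = {f_value:.2f}")
```

The same function applied to a single condition, for example the labial minus alveolar difference within the
unaspirated stops only, gives the kind of simple-effect test used to follow up an interaction.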

III. Results 

3.1 F2 value

Fig. 3 graphs the F2 offset values for male speakers (Fig. 3a) and female speakers (Fig. 3b), broken
down by the contexts of aspiration, place of articulation and changing vowel. The changing vowel contexts are
labeled /i/ and /u/. Repeated measures ANOVA results show significant main effects for all three
factors: place of articulation: F(1, 71) = 1084, p < 0.001; aspiration: F(1, 71) = 7.39, p = 0.008; changing
vowel context: F(1, 71) = 726, p < 0.001. The V1 F2 offset values are comparatively high in the context of
alveolars, in the context of unaspirated consonants, and before the vowel /i/.



Fig. 3 F2 offset values for male speakers (a) and female speakers (b), broken down by the contexts of
aspiration, place of articulation and changing vowel

To examine the coarticulatory effect under the various consonant contexts in detail, the F2 delta
value is analyzed in the next section. Specifically, the extent of coarticulation under the contexts of place of
articulation and aspiration is presented.

3.2 F2 delta value

Table 1 presents the F2 delta means and significance results for the main effects. Table 1 shows
significant main effects for both place of articulation and aspiration, with the effect in the labial
contexts greater than that in the alveolar contexts, and the effect in the aspirated stop contexts greater
than that in the unaspirated ones.



Table 1 F2 delta means (in Hz) and statistical results for the main effects

Factor                  Level         Mean    Statistical result
Place of articulation   Labial        347.9   F(1, 71) = 22.0, p < 0.001
                        Alveolar      275.6
Aspiration              Unaspirated   281.7   F(1, 71) = 16.3, p < 0.001
                        Aspirated     341.6



When interaction effects are examined, the place of articulation x aspiration interaction is
significant: F(1, 71) = 13.3, p = 0.001. This is attributable to the inconsistent effect of place of articulation
under the different aspiration contexts. In the next subsections the effects of the two factors are described
in detail to help interpret the main effects.

3.2.1 The effect of place of articulation: 

1) The unaspirated stop contexts 

Fig. 4 shows the F2 delta under the effects of place of articulation and aspiration. The repeated
measures ANOVA shows that, in the unaspirated stop contexts, the effect of place of articulation is significant:
F(1, 71) = 50.6, p < 0.001, with the extent of coarticulation in the labial context exceeding that in the
alveolar context.

Fig. 4 F2 delta under the effects of place of articulation and aspiration

2) The aspirated stop contexts 

When the intervocalic stops are aspirated, there is no significant difference between the two place of
articulation contexts: F(1, 71) = 1.24, p = 0.27. That is, place of articulation does not significantly affect
the extent of coarticulation when the stop is aspirated.

3.2.2 The effect of aspiration: 

1) The context of labials 

The repeated measures ANOVA shows that, in the context of labials, there is no significant
difference between the unaspirated and aspirated stop contexts: F(1, 71) = 0.64, p = 0.428. In this context,
aspiration has no significant effect on trans-consonantal coarticulation.

2) The context of alveolars

In the context of alveolar stops, the difference between the unaspirated and aspirated stop contexts is
significant: F(1, 71) = 22.8, p < 0.001. The aspirated stop context exceeds the unaspirated one in
the extent of coarticulation.

IV. Discussion 

The analysis in the previous section shows that the changing vowel context has a significant effect on
the V1 F2 offset value, with the F2 offset value higher before the vowel /i/ than before /u/. This study
investigates the anticipatory coarticulatory effect of V2 on V1, with /i/ and /u/ as the changing vowels
in V2. As mentioned above, the F2 value of /i/ is high, while that of /u/ is low. The V1 F2 offset
values differ significantly before /i/ and /u/, which implies that a trans-consonantal anticipatory
vowel-to-vowel effect exists in Chinese.

With regard to the F2 delta values, when main effects are examined, there is a significant effect of place of
articulation, with the effect in the labial contexts greater than that in the alveolar contexts. The DAC model
[11] predicts that in VCV sequences, an increase in the degree of constraint for the consonant should yield an
increase in the prominence of the C-to-V effects and a decrease in the strength of the V-to-C and V-to-V
effects. According to the DAC model, differences in the involvement of the articulators in the production of
different obstruents result in variation in the degree of articulatory constraint. Obstruents, particularly
alveolopalatals, that maximally engage the tongue dorsum for the occlusion gesture would reduce vowel effects,
so that stops like /d/ and /t/ exhibit reduced extents of V-to-V coarticulation.

With respect to coarticulation, coarticulatory sensitivity, which is the magnitude and temporal extent of
the coarticulatory effect at a given articulator, is shown to be inversely related to the degree of articulatory
constraint: highly constrained phonetic segments are generally more resistant to coarticulation than those
specified for a lower degree of articulatory constraint, and thus less sensitive to coarticulatory influence from
neighboring segments. The model also predicts that coarticulatory dominance is positively related to the degree
of articulatory constraint: phonetic segments with a high DAC value, which are resistant to coarticulation,
usually have prominent coarticulatory effects on neighboring phonetic segments.

In this study, the coarticulatory pattern is consistent with the DAC model: as far as main
effects are concerned, the coarticulatory effect is greater in the context of labials than in that of alveolars.
The coarticulatory resistance of alveolars is comparatively high, which is in line with the DAC model.

As for the effect of aspiration, the analysis in the previous section shows a general tendency for the
coarticulatory effect to be greater for aspirated stops than for unaspirated ones, which is attributed to the
high articulatory strength of aspirated obstruents. Generally speaking, consonants can be classified into two
groups, 'fortis' and 'lenis', which refer to consonants produced with greater and lesser articulatory energy
respectively. Fortis and lenis were coined as less misleading terms for consonantal contrasts in languages that
do not employ actual vocal fold vibration in their voiced consonants, but instead involve differing amounts of
articulatory strength. For example, in English the fortis consonants, as in 'come' and 'put', exhibit a longer
stop closure and shorter preceding vowels than their lenis counterparts, as in 'grass' and 'bed'. In Chinese,
aspirated consonants are fortis, while unaspirated ones are lenis.

As the aspirated consonants in Chinese are fortis, their articulatory strength is high. Generally speaking,
consonants with high articulatory strength tend to exert a strong effect on the preceding vowels. In Chinese,
consonants and vowels are combined into one unit, the syllable, and syllables with consonants of high
articulatory strength may exert a strong effect on the preceding vowel. As a result, the anticipatory
coarticulatory effect is great when C2 is an aspirated consonant.

Regarding the effect of place of articulation, results from the previous section show that, in the context
of unaspirated consonants, the coarticulatory effect in the trans-labial context is greater than that in the
trans-alveolar context, which is in accordance with the DAC model. As for the effect of aspiration, when C2 is
alveolar, the coarticulatory effect is comparatively great in the context of aspirated consonants, which is
attributed to the high articulatory strength of the aspirated consonants.

However, when C2 is aspirated, there is no significant effect of place of articulation on coarticulation.
This result stems from the effect of aspiration on the V1 F2 offset values. Fig. 3 in section 3.1 shows that
the V1 F2 offset value is comparatively low when C2 is aspirated. Further analysis shows that when V2 is /i/,
there is no significant effect of aspiration on the V1 F2 offset values (labial: F(1, 71) = 0.47, p = 0.494;
alveolar: F(1, 71) = 2.05, p = 0.157). Only when V2 is /u/ is there a significant effect of aspiration on the
V1 F2 offset values (labial: F(1, 71) = 6.03, p = 0.016; alveolar: F(1, 71) = 16.4, p < 0.001), with the V1 F2
offset value higher in the context of unaspirated consonants. That is to say, when V2 is /u/, the V1 F2 offset
value is reduced in the context of aspirated consonants.

The strength of this effect differs with place of articulation: for labials the significance is
comparatively low (p = 0.016), while for alveolars it is high (p < 0.001). The V1 F2 offset value therefore
varies more in the alveolar context. The F2 delta is the difference between the V1 F2 offset values in the
contexts of the following vowels /i/ and /u/. In the aspirated alveolar context, the V1 F2 offset value is
reduced before /u/, so the F2 delta increases and approaches that of the labial context. As a result, in the
context of aspirated consonants, there is no significant difference between the alveolar and labial contexts.

The previous section also shows that, when C2 is labial, there is no significant effect of
aspiration on the extent of coarticulation. As mentioned above, the articulatory strength of aspirated
consonants is high, but as the degree of articulatory constraint of labials is low, the effect of aspiration for
labials on the V1 F2 offset value is small. When V2 is /i/, there is no significant effect of aspiration on the
V1 F2 offset value, while when V2 is /u/, the significance of the effect of aspiration on the V1 F2 offset value
in the labial context is low. As a result, when C2 is labial, there is no significant effect of aspiration on
the extent of coarticulation.

V. Conclusion 

In this study, V-to-V coarticulation in VCV sequences is investigated. A significant difference is found
between the V1 F2 offset values in the contexts of the following vowels /i/ and /u/, which means that
trans-segmental anticipatory coarticulation exists in Chinese. As far as main effects are concerned, the
coarticulatory effect is greater in the labial context than in the alveolar context, which is consistent with
the DAC model. As the articulatory strength of aspirated consonants is high, the effect is great when C2 is an
aspirated consonant. The V1 F2 offset value is reduced in the context of aspirated consonants before the vowel
/u/, and this reduction intensifies when C2 is an alveolar consonant. Therefore, in the context of aspirated
consonants, there is no significant difference between the two place of articulation contexts. The reduction
weakens when C2 is labial, so in this case there is no effect of aspiration on the extent of coarticulation.

This study has implications for speech engineering. In speech synthesis, the effect of trans-consonantal
coarticulation must be taken into consideration. The extent of coarticulation in the labial context exceeds that
in the alveolar context, and the extent in the context of aspirated consonants exceeds that in the context of
unaspirated ones, so particular attention should be paid to these contexts. However, in some cases the
difference in coarticulatory effects disappears and can be neglected: place of articulation has no significant
effect when C2 is aspirated, and aspiration has no significant effect when C2 is labial.